Performance Comparison of Training Algorithms for Semi-Supervised Discriminative Language Modeling

Authors

  • Erinç Dikici
  • Arda Çelebi
  • Murat Saraclar
Abstract

Discriminative language modeling (DLM) has been shown to improve the accuracy of automatic speech recognition (ASR) systems, but training it requires large amounts of both acoustic and text data. One way to overcome this requirement is to train on simulated hypotheses instead of real ASR hypotheses, an approach called semi-supervised training. In this study, we compare six different perceptron algorithms under the semi-supervised training approach. We formulate DLM as both a structured prediction problem and a reranking problem, optimizing a different criterion in each case. We find that the ranking variants achieve word error rate (WER) reductions similar to or better than those of structured perceptrons when trained on real data, simulated data, or a combination of the two.
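
The two formulations mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the n-gram feature set, the function names, and the toy data are all assumptions made for illustration, the baseline recognizer score is left out, and the pairwise update shown is only one simple ranking rule, not necessarily one of the six variants compared in the paper. The structured update moves the weights toward the lowest-error (oracle) hypothesis and away from the current top-scoring one; the ranking update corrects every pair of hypotheses that is scored in the wrong order.

```python
from collections import Counter

def features(words):
    """Unigram and bigram counts (an assumed, illustrative feature set)."""
    feats = Counter(words)
    feats.update(zip(words, words[1:]))
    return feats

def score(weights, feats):
    """Linear model score: dot product of weights and feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def word_errors(ref, hyp):
    """Word-level edit distance between a reference and a hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def _update(weights, good, bad):
    """Move the weights toward `good` features and away from `bad` features."""
    for f, v in good.items():
        weights[f] = weights.get(f, 0.0) + v
    for f, v in bad.items():
        weights[f] = weights.get(f, 0.0) - v

def train_structured(data, epochs=5):
    """Structured perceptron: compare the oracle with the current best."""
    weights = {}
    for _ in range(epochs):
        for ref, nbest in data:
            feats = [features(h) for h in nbest]
            errs = [word_errors(ref, h) for h in nbest]
            oracle = min(range(len(nbest)), key=lambda k: errs[k])
            best = max(range(len(nbest)), key=lambda k: score(weights, feats[k]))
            if errs[best] > errs[oracle]:
                _update(weights, feats[oracle], feats[best])
    return weights

def train_ranking(data, epochs=5):
    """Pairwise ranking perceptron: fix every misordered hypothesis pair."""
    weights = {}
    for _ in range(epochs):
        for ref, nbest in data:
            feats = [features(h) for h in nbest]
            errs = [word_errors(ref, h) for h in nbest]
            for a in range(len(nbest)):
                for b in range(len(nbest)):
                    misordered = score(weights, feats[a]) <= score(weights, feats[b])
                    if errs[a] < errs[b] and misordered:
                        _update(weights, feats[a], feats[b])
    return weights

def rerank(weights, nbest):
    """Return the hypothesis with the highest model score."""
    return max(nbest, key=lambda h: score(weights, features(h)))

# Toy data (hypothetical): one reference with a two-entry N-best list.
toy_ref = "the cat sat".split()
toy_nbest = ["the cat sad".split(), "the cat sat".split()]
toy_data = [(toy_ref, toy_nbest)]
```

In a semi-supervised setting, the N-best lists passed to either trainer would come from a hypothesis simulator rather than from a real recognizer; that step is not modeled here.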

Similar resources

Risk-Based Semi-Supervised Discriminative Language Modeling for Broadcast Transcription

This paper describes a new method for semi-supervised discriminative language modeling, which is designed to improve the robustness of a discriminative language model (LM) obtained from manually transcribed (labeled) data. The discriminative LM is implemented as a log-linear model, which employs a set of linguistic features derived from word or phoneme sequences. The proposed semi-supervised di...

Unsupervised training methods for discriminative language modeling

Discriminative language modeling (DLM) aims to choose the most accurate word sequence by reranking the alternatives output by the automatic speech recognizer (ASR). The conventional (supervised) way of training a DLM requires a large amount of acoustic recordings together with their manual reference transcriptions. These transcriptions are used to determine the target ranks of the ASR outputs, ...

Semi-Supervised Discriminative Language Modeling with Out-of-Domain Text Data

One way to improve the accuracy of automatic speech recognition (ASR) is to use discriminative language modeling (DLM), which enhances discrimination by learning where the ASR hypotheses deviate from the uttered sentences. However, DLM requires large amounts of ASR output to train. Instead, we can simulate the output of an ASR system, in which case the training becomes semi-supervised. The advan...

A Comparison of Discriminative EM-Based Semi-Supervised Learning Algorithms on Agreement/Disagreement Classification

Recently, semi-supervised learning has been an active research topic in the natural language processing community, to save effort in hand-labeling for data-driven learning and to exploit a large amount of readily available unlabeled text. In this paper, we apply EM-based semi-supervised learning algorithms such as traditional EM, co-EM, and cross validation EM to the task of agreement/disagreem...

Lightly supervised training for risk-based discriminative language models

We propose a lightly supervised training method for a discriminative language model (DLM) based on risk minimization criteria. In lightly supervised training, pseudo labels generated by automatic speech recognition (ASR) are used as references. However, as these labels usually include recognition errors, the discriminative models estimated from such faulty reference labels may degrade ASR perfo...

Publication date: 2012